Cai et al. BMC Bioinformatics 2013, 14(Suppl 12):S2 
http://www.biomedcentral.com/1471-2105/14/S12/S2



Bioinformatics 



RESEARCH Open Access 



A novel subnetwork alignment approach predicts 
new components of the cell cycle regulatory 
apparatus in Plasmodium falciparum 

Hong Cai1†, Changjin Hong2†, Timothy G Lilburn3†, Armando L Rodriguez1,4, Sheng Chen2, Jianying Gu5,
Rui Kuang2*, Yufeng Wang1,6*

From IEEE International Conference on Bioinformatics and Biomedicine 2012 
Philadelphia, PA, USA. 4-7 October 2012 



Abstract 

Background: According to the World Health Organization (WHO), half the world's population is at risk of contracting malaria. The WHO estimated that in 2010 there were 219 million cases of malaria, resulting in 660,000 deaths and an enormous economic burden on the countries where malaria is endemic. Malaria researchers have adopted various high-throughput genomics-based techniques. As a result, new avenues for studying this disease are being explored and new targets for controlling it are being developed. Here, we apply a novel neighborhood subnetwork alignment approach to identify the interacting elements that help regulate the cell cycle of the malaria parasite Plasmodium falciparum.

Results: We used our novel subnetwork alignment approach to compare networks in Escherichia coli and P. falciparum. Some 574 P. falciparum proteins were predicted to be functional orthologs of known cell cycle proteins in E. coli. Over one third of these predicted functional orthologs were annotated as "conserved Plasmodium proteins" or "putative uncharacterized proteins" of unknown function. The predicted functionalities included cyclins, kinases, surface antigens, transcriptional regulators and various functions related to DNA replication, repair and cell division.

Conclusions: The results of our analysis demonstrate the power of our subnetwork alignment approach to assign functionality to previously unannotated proteins. Here, we focused on proteins involved in cell cycle regulation. These proteins control diverse aspects of the parasite life cycle and important aspects of pathogenesis.



Background

Written descriptions of the symptoms of malaria have existed for over 4,000 years. Evidence for the existence of the genus Plasmodium has been recovered from amber approximately 30 million years old [1]. Thus, the disease has probably evolved alongside its hosts since the emergence of the first humans in Africa. In 2010, it was estimated that 660,000 people died from malaria. This estimate is probably conservative, because reporting of the disease is extremely variable from one region to another. Generally, the regions with the highest incidence of malaria also have the weakest mechanisms for reporting and recording cases.

Malaria is caused by protozoan parasites of the genus Plasmodium. Different species tend to infect different host species. Five species infect humans; the two most widespread are P. vivax and P. falciparum, and the latter is the most lethal. P. falciparum has a complex life cycle that spans the arthropod vector and the human host.
[...] parasite first infects the liver. After maturation in the liver, the parasite infects red blood cells. In this so-called RBC [...]

A number of antimalarial drugs have been developed over the years, notably chloroquine and artemisinin. However, over the past decades, the effectiveness of all of these drugs except artemisinin has been significantly reduced by the evolution of drug-resistant parasites. Recently, however, evidence has emerged that resistance to artemisinin has appeared and is beginning to spread. It is therefore essential that new drug targets be identified, and new genomics-based technologies are key to this task. Genome sequences of P. falciparum [2] and other Plasmodium spp. [3-7] have been completed. These have facilitated numerous studies on, for example, parasite transcription [8-19], translation [20-29], metabolism [30-34], protein-protein interactions [35-38], and epigenetic regulation [39-42]. The data from these studies have, in turn, laid the groundwork for systems biology-oriented studies of the networks associated with parasite development, survival, pathogenesis, and virulence [43-46].

Network alignment is a popular systems biology method [47-55]. However, the malaria parasite is only distantly related to other, better-understood model organisms, so the utility of this approach may be questioned. About 60% of the open reading frames in P. falciparum are annotated as "hypothetical proteins" [2]. This is simply because information about individual proteins cannot be transferred by homology across such extended evolutionary distances. To tackle this problem, we recently developed a neighborhood subnetwork alignment algorithm [56]. It focuses on the similarities between functional modules, that is, on the interactions among proteins rather than on individual proteins. We define a neighborhood subnetwork as the set of nodes (proteins) reachable from a central protein via a small number of edges in a protein-protein interaction (PPI) network. A proof-of-concept study predicted previously unrecognized transcriptional regulators involved in diverse facets of the parasite life cycle [43]. In this paper, we use the subnetwork alignment approach to uncover candidate proteins with roles in cell cycle regulation, several of which are potential drug targets. As our knowledge of the mechanics of the cell cycle deepens, so will our ability to influence parasite survival in the host and to identify key drug targets.

Results and discussion 

Neighborhood subnetwork alignments predicted 574 proteins that are associated with cell cycle regulation in the malaria parasite

The cell cycle of the malaria parasite differs significantly from that of other model eukaryotic organisms. There is no direct correspondence between schizogony, during which the parasite undergoes multiplication, and the typical G1, S, G2 and M phases of the cell cycle in crown eukaryotes. In addition, the parasite's cell cycle features asynchronous nuclear divisions, organellar segregation, and morphogenesis of daughter merozoites. A thorough sequence similarity-based search by Doerig and Chakrabarti predicted a list of proteins that might be involved in the cell cycle [57]. These included cyclins, cyclin-dependent kinases, and proteins critical for cell division and signal transduction. In a previous study, we used a variational Bayesian expectation maximization (VBEM) approach to reveal the dynamics of the parasite cell cycle network. We also used it to infer regulatory relationships from time-series transcriptomic data [58]. The results of that study exposed gaps in our cell cycle network model. Here we use our subnetwork alignment approach to try to fill these gaps.

We predicted that 574 proteins in P. falciparum were functional orthologs of known cell cycle proteins in E. coli (Additional File 1). Over 34% of these predicted functional orthologs were annotated as "conserved Plasmodium proteins" or "putative uncharacterized proteins" of unknown function.

The set of functional orthologs is involved in key 
biological processes 

Table 1 shows representative functional categories for the cell cycle-associated protein set, as revealed by Gene Ontology (GO) enrichment analysis. These functional categories belong to some of the most important mechanisms governing the growth and survival of the parasite. Some of the more interesting functional groups are discussed in the following sections.

1. Cyclin 

Our subnetwork alignment approach predicted PFL1330c to be a putative cyclin [58]. Cyclins are a family of proteins whose expression levels oscillate during the cell cycle. The synthesis and degradation of cyclins control the activity of cyclin-dependent kinases and the accurate transition through key cell cycle checkpoints. Yeast two-hybrid (Y2H) experiments [37] have shown that PFL1330c physically interacts with an apical sushi protein (ASP) (PFD0295c). ASP has an adhesive "sushi" domain and is thought to have a role in the merozoite invasion process.

2. Kinases 

Signal transduction plays a key role in managing the complexity of the cell cycle [59,60]. Figure 1 shows the eight kinases (in yellow) predicted by the subnetwork alignments, together with the proteins directly associated with them. Three protein kinases have been implicated in cell cycle regulation:

(1) PfMAP1 (PF14_0294) is a homolog of mitogen-activated protein kinase (MAPK) [61]. This kinase is believed to be a central member of the MAPKKK cascade. It may be involved in parasite responses to a variety of exogenous or endogenous stimuli or environmental stresses. PfMAP1 has three PPI partners: (a) a serine/threonine protein
Table 1 Representative P. falciparum proteins that were predicted to be involved in the cell cycle regulatory network

Functional category | PlasmoDB accession number | Annotation
Cyclin | PFL1330c | Cyclin-related protein, Pfcyc-2
Cell differentiation | PFE0375w | Cell differentiation protein, putative (CAF40)
Chromosome organization | PFE0450w | Chromosome condensation protein, putative
 | PF11_0062 | Histone H2B
Mitosis | PF13_0050 | HORMA domain protein, putative
DNA repair | MAL7P1.145 | Mismatch repair protein pms1 homologue, putative
 | PF10_0114 | DNA repair protein RAD23, putative
 | PF08_0126 | DNA repair protein rad54, putative
DNA replication | PF07_0023 | DNA replication licensing factor mcm7 homologue, putative
 | PFL0580w | DNA replication licensing factor MCM5, putative
 | MAL7P1.21 | Origin recognition complex subunit 2, putative
 | PFE1345c | Minichromosome maintenance protein 3, putative
Regulation of cell cycle | PF07_0047 | AAA family ATPase, CDC48 subfamily (Cdc48)
 | PFL1925w | Cell division protein FtsH, putative
Protein phosphorylation | PFC0105w | Serine/threonine protein kinase, putative
 | MAL13P1.278 | Serine/threonine protein kinase, putative
 | PF14_0294 | Mitogen-activated protein kinase 1
 | PFC0755c | Protein kinase, putative
 | PF11_0464 | Ser/Thr protein kinase, putative
 | PF11_0156 | Ser/Thr protein kinase
 | PF11_0239 | Calcium-dependent protein kinase, putative
 | PFL1370w | NIMA-related protein kinase, Pfnek-1
Proteolysis | PF14_0517 | Peptidase, putative
 | MAL13P1.184 | Endopeptidase, putative
 | PFL1635w | Ulp1 protease, putative
 | PF10_0150 | Methionine aminopeptidase
Cytoskeleton | MAL8P1.146 | Filament assembling protein, putative
Heat shock | PFI0875w | Heat shock protein 70 (HSP70) homologue
 | PFL0565w | Heat shock protein DNAJ homologue Pfj4
 | PF11_0351 | Heat shock protein hsp70 homologue
 | PF11_0188 | Heat shock protein 90, putative
 | PF07_0029 | Heat shock protein 86
 | PF08_0054 | Heat shock 70 kDa protein
 | PFB0595w | Heat shock 40 kDa protein, putative
 | PFI0355c | ATP-dependent heat shock protein, putative
Pathogenesis | PFC0005w | Erythrocyte membrane protein 1, PfEMP1
 | PFI0005w | Erythrocyte membrane protein 1, PfEMP1
 | PFD0005w | Erythrocyte membrane protein 1, PfEMP1
 | PF08_0103 | Erythrocyte membrane protein 1, PfEMP1
 | PFL0935c | Erythrocyte membrane protein 1, PfEMP1
 | PFL0005w | Erythrocyte membrane protein 1, PfEMP1
 | PFB1055c | Erythrocyte membrane protein 1, PfEMP1
 | PFI1830c | Erythrocyte membrane protein 1, PfEMP1
Microtubule cytoskeleton organization and activity | PFC0165w | Spindle pole body protein, putative
 | PF07_0104 | Kinesin-like protein, putative
Transcriptional regulation | PF10_0143 | Transcriptional coactivator ADA2 (ADA2)
 | PFD0985w | AP2/ERF domain-containing protein PFD0985w
 | PFL1085w | Transcription factor with AP2 domain, putative
 | PF11_0442 | Transcription factor with AP2 domain, putative
 | PFE0840c | Transcription factor with AP2 domain, putative
 | PF07_0126 | Transcription factor with AP2 domain, putative
 | PF10_0075 | Transcription factor with AP2 domain, putative
 | PFL1900w | Transcription factor with AP2 domain, putative
 | PFL0465c | Zinc finger transcription factor (Krox1)



[Figure 1 image: network of the predicted kinases and their associated proteins; the legend groups COG categories into Information Storage and Processing, Cellular Processes and Signaling, Metabolism, and Poorly Characterized]



Figure 1 A graph showing the proteins associated with kinases predicted to be involved in cell cycle regulation. Square nodes represent the kinases. Node size is proportional to the degree of the connectivity of the node. Nodes are colored according to their functional classification in the eggNOG database [79]. The COG categories [80] are: (J) Translation, ribosomal structure and biogenesis; (A) RNA processing and modification; (K) Transcription; (L) Replication, recombination and repair; (B) Chromatin structure and dynamics; (D) Cell cycle control, cell division, chromosome partitioning; (Y) Nuclear structure; (V) Defense mechanisms; (T) Signal transduction mechanisms; (M) Cell wall/membrane/envelope biogenesis; (N) Cell motility; (Z) Cytoskeleton; (W) Extracellular structures; (U) Intracellular trafficking, secretion, and vesicular transport; (O) Posttranslational modification, protein turnover, chaperones; (C) Energy production and conversion; (G) Carbohydrate transport and metabolism; (E) Amino acid transport and metabolism; (F) Nucleotide transport and metabolism; (H) Coenzyme transport and metabolism; (I) Lipid transport and metabolism; (P) Inorganic ion transport and metabolism; (Q) Secondary metabolites biosynthesis, transport and catabolism; (R) General function prediction only; and (S) Function unknown. Confidence scores for the interactions among the nodes (S values from STRING [81]) were divided into three groups: low (0.150-0.399), medium (0.400-0.700) and high (0.701-0.999). These groups are represented by thin, medium and heavy lines, respectively.
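The three confidence groups used for the edge widths in Figure 1 amount to a simple binning rule. The sketch below is a minimal illustration of that rule. It assumes that STRING scores are reported on the 0 to 1 scale with three decimals, so the group boundaries 0.399/0.400 and 0.700/0.701 leave no gap. The function name and the `LINE_WIDTH` mapping are hypothetical and are not part of the authors' pipeline.

```python
def confidence_group(score: float) -> str:
    """Map a STRING confidence score S to the edge-width group used in Figure 1.

    Assumption: S lies in [0.150, 0.999] and has three decimals, so the
    boundaries 0.399/0.400 and 0.700/0.701 leave no gaps between groups.
    """
    if not 0.150 <= score <= 0.999:
        raise ValueError(f"score {score} is outside the range used in Figure 1")
    if score < 0.400:
        return "low"      # drawn as thin lines
    if score <= 0.700:
        return "medium"   # drawn as medium lines
    return "high"         # drawn as heavy lines


# Hypothetical mapping from group to line style, mirroring the caption.
LINE_WIDTH = {"low": "thin", "medium": "medium", "high": "heavy"}

for s in (0.15, 0.42, 0.70, 0.93):
    group = confidence_group(s)
    print(s, group, LINE_WIDTH[group])
```

The boundaries are inclusive on the lower edge of each group, matching the ranges given in the caption.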
kinase (SRPK1) (PFC0105w). PfSRPK plays a role in the mRNA splicing machinery [62]. Gene disruption of SRPK in the rodent parasite P. berghei suggested that it is essential during male gamete formation [63]. (b) Myosin A (PF13_0233) is a component of the linear motor that drives merozoite motility during invasion. (c) MAL7P1.132 is a conserved Plasmodium protein of unknown function that was recently annotated as a putative kinase [64].

(2) PfNek-1 (PFL1370w) encodes a NIMA-related kinase and is considered a potential antimalarial target. A recent study based on reverse genetics showed that it is required for the asexual cycle in red blood cells. The same study found that it is sex-specific, being expressed in male gametocytes [65]. A yeast two-hybrid assay showed that PfNek-1 interacts with a conserved hypothetical protein, PFC0345w. Both proteins are abundantly expressed at the schizont stage.

(3) cdc2-related protein kinase 4 (CRK4) (PFC0755c) [57] was observed as a phosphoprotein in the schizont stage of P. falciparum-infected red blood cells. Y2H showed that it interacts directly with an AAA family ATPase.

The most highly connected kinase predicted to be involved in the cell cycle is the serine/threonine protein kinase PfCLK-3 (PF11_0156), with 28 association partners. Ten proteins were identified as interactors in Y2H experiments [37]. These include a rhoptry neck protein 3 (RON3), a splicing factor 3A subunit, eukaryotic translation initiation factor 3 subunit 10, a chloroquine resistance marker protein (CRMP), a syntaxin involved in vesicle exocytosis, an export protein, and five conserved hypothetical proteins. Together they indicate PfCLK-3's involvement in merozoite invasion, splicing, translation and trafficking. Global kinome analysis suggested that PfCLK-3 is likely to be essential for parasite schizogony in RBCs [28].

Subnetwork alignment also predicted that a calcium-dependent protein kinase 6 (PfCDPK6) (PF11_0239) is involved in cell cycle regulation. Previous phenotypic analysis showed that CDPK6 plays a role in sporozoite formation and invasion of hepatocytes [66]. This kinase is associated with 11 other proteins verified by Y2H assays. Two of these association partners are likely to be involved in cell cycle regulation as well. The first is a putative Ndc80 protein that functions in spindle checkpoint signaling and in kinetochore organization and movement. The second is a putative Snf2-related CBP activator (SRCAP) involved in base excision repair and chromosome remodeling. PfCDPK6 is also associated with the following proteins:
- PfBet1 of the SNARE complex for secretion
- a putative rhoptry-localized protein that might be related to the merozoite invasion process
- a liver-stage antigen
- a ubiquitin domain-containing protein
- five hypothetical proteins

The functional roles of the other predicted kinases are largely unknown. PF11_0464 is a putative serine/threonine protein kinase. A gene disruption attempt suggested that it is likely essential for the parasite RBC stage [28]. This protein is associated with two proteins required for 60S ribosomal subunit biogenesis (60S ribosomal protein L6-2 and nucleolar GTP-binding protein 1). It is also associated with a pseudogene of surface-associated interspersed gene 13.1 (SURFIN13.1), which has been implicated in the invasion process. MAL13P1.278 (PfArk3) is a putative serine/threonine kinase in the aurora-related kinase (ARK) family. This family of kinases has been implicated in the regulation of endocytosis and of the actin skeleton [67]. PfArk3 has a weak association with an erythrocyte membrane protein 1, PfEMP1 (PFB1055c), that may be related to mitotic recombination.

3. Proteins implicated in cell division, chromosome organization, and DNA replication

Our analysis has implicated a number of other predicted proteins in cell division, mitosis, chromosome organization, and DNA replication. One of these is PFE0450w, a putative chromosome condensation protein that forms part of the ATP-dependent chromatin remodeling complex [68]; it was predicted to be associated with cell cycle regulation. As shown in Figure 2, 16 proteins are associated with PFE0450w. Eight of these associations have been verified by Y2H. This set includes two tat-binding proteins pertinent to proteasome activities, a pre-mRNA splicing factor, a eukaryotic translation initiation factor 3 subunit 10, and three conserved Plasmodium proteins of unknown function. Perhaps the most important association suggested by our analysis is the link with the high molecular weight rhoptry protein 2 (RhopH2). RhopH2 is localized in the rhoptries of schizonts and plays a role in cytoadherence and merozoite invasion of the red blood cell [69]. Our subnetwork alignment also predicted several key components of the DNA replication machinery, including DNA replication licensing factors and an origin recognition complex subunit.

4. DNA repair proteins 

The cell cycle also involves DNA repair mechanisms that ensure genome integrity. A putative DNA repair protein RAD23 (PF10_0114) was predicted to have 92 protein-protein association partners (Figure 3). Of these, 22 are direct physical interactions demonstrated by Y2H. This protein is a member of an escort complex for proteasome-mediated degradation of non-native ER proteins. Other suggested interactors with RAD23 include heat shock chaperone proteins, ATP-dependent proteases, serine-threonine kinases, and secreted proteins. These have been implicated in stress responses, signaling cascades, and protein sorting and trafficking.

5. Transcriptional regulators 

Seven parasite-specific ApiAP2 transcription factors were predicted to have a role in cell cycle regulation, underscoring the importance of transcriptional regulation. ApiAP2 proteins are gaining recognition as attractive drug targets because of their critical roles in the parasite life cycle. Their distant evolutionary relationship to the host also implies a diminished possibility of side-effects for
Figure 2 The proteins associated with a putative chromosome condensation protein PFE0450w. Node size is proportional to the degree of the connectivity of the node. The visualization is as for Figure 1.






humans [70]. The ApiAP2 protein with the highest degree of connectivity in the cell cycle regulatory network is PFD0985w (Figure 4). Its 17 association partners play versatile roles in epigenetic regulation, kinetochore organization, host cell entry and adhesion, secretion, and protein degradation by the ubiquitin-proteasome system [45]. The roles of another ApiAP2 protein (PF07_0126) can be inferred from its associations with 15 proteins related to transcriptional regulation, chromatin remodeling, replication, and repair. This protein interacts with multiple signaling molecules, including a calcium-dependent protein kinase and a ligand protein of the 14-3-3 family.

The Y2H interactions of the ApiAP2 protein PF10_0075 indicate its involvement in cell cycle regulation (Figure 4). Its interaction partners are:
- another ApiAP2 protein (MAL8P1.153)
- the histone acetyltransferase GCN5 (PF08_0034), which is important for histone modification and chromatin remodeling [71]
- a DNA excision repair protein rhp16 (PFL2440w)
- actin (PFL2215w)
- a putative kelch protein whose ortholog was implicated in cytoskeletal function in the Atlantic horseshoe crab, Limulus polyphemus [72]

6. Surface antigens

A group of surface antigens in the Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) family (Table 1) was predicted to be associated with the cell cycle. Encoded by the var genes, PfEMP1 is one of the most abundant protein families in P. falciparum. Its polymorphic nature leads to antigenic variation. This allows the parasite to successfully evade the human immune system, contributing to pathogenicity and virulence.

Conclusions 

We previously developed a neighborhood subnetwork alignment approach, and here we apply this method to predict the network components involved in cell cycle regulation. The network components identified included cyclins, kinases, transcriptional regulators, and cell surface
Figure 3 The proteins associated with a putative DNA repair protein RAD23 (PF10_0114). Node size is proportional to the degree of the connectivity of the node. Nodes are colored according to their functional classification in the eggNOG database. The visualization is as for Figure 1.



antigens, among others. Some of these are obvious and have already been confirmed by experimental approaches such as yeast two-hybrid experiments. This validates our approach as a useful tool for the in silico prediction of previously unrecognized interactors in cell cycle regulation. It also suggests that the expanded set of interactors discussed here forms a new set of potential targets for drugs or therapies.



Methods 

Subnetwork querying by neighborhood alignments 

The prediction of functional orthologs for the P. falciparum proteins was structured as a subnetwork querying problem. Network querying is a technique that searches a large "target" network of one organism to find subnetwork regions that look similar to a given query network of another organism [73,74]. The "query" network
Figure 4 The proteins associated with ApiAP2 transcriptional regulators. Square nodes represent the ApiAP2 proteins. Node size is proportional to the degree of the connectivity of the node. Nodes are colored according to their functional classification in the eggNOG database. The visualization is as for Figure 1.



that we search against the "target" network is a well-studied functional module in a model organism. Network querying allows us to predict similar modules in the less-studied target organism, providing a way to relate biological knowledge of functionality across organisms [75]. Previously, we applied a neighborhood alignment method for subnetwork querying to predict novel transcriptional regulators with versatile roles in the parasite life cycle [43]. We adopted the same method to identify proteins involved in cell cycle regulation.

First, a set of E. coli proteins related to cell cycle regulation (GO:0007049: cell cycle) was mapped onto the E. coli PPI network. For each cell cycle protein, a set of "neighbors" was selected. This created a subnetwork and, by inference, a network of subnetworks in the query network. Likewise, each P. falciparum protein was mapped onto the P. falciparum PPI network, and a subnetwork of its neighbors was constructed. To construct neighborhood subnetworks of comparable size for alignment, proteins within k hops of the central protein were included. k was chosen so that the neighborhood size stayed under 500, unless the central protein had more than 500 direct neighbors.
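The neighborhood construction described above can be sketched as a breadth-first search whose radius k grows while the neighborhood stays under the size limit. This is a minimal sketch with several assumptions. The PPI network is taken to be undirected and stored as an adjacency dictionary. "Neighborhood size" is read as the number of proteins other than the central protein. When even the 1-hop neighborhood reaches the limit, that neighborhood is kept. The function names and the toy network are hypothetical.

```python
from collections import deque


def k_hop_neighborhood(adj, center, k):
    """Return all proteins within k edges of `center` (center included), via BFS."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def neighborhood_subnetwork(adj, center, max_neighbors=500):
    """Grow k from 1 while the neighbor count (center excluded) stays under max_neighbors.

    Returns (node set, chosen k). If even the 1-hop neighborhood is too large,
    it is returned unchanged with k = 1. Growth also stops once the whole
    connected component has been reached.
    """
    k = 1
    best = k_hop_neighborhood(adj, center, k)
    while True:
        candidate = k_hop_neighborhood(adj, center, k + 1)
        if len(candidate) - 1 >= max_neighbors or len(candidate) == len(best):
            return best, k
        best, k = candidate, k + 1


# Toy, hypothetical PPI network (undirected: each edge is listed in both directions).
ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(neighborhood_subnetwork(ppi, "A", max_neighbors=3))
```

Recomputing each k-hop set from scratch keeps the code short. A production version would extend the previous BFS frontier instead.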

After the neighborhood subnetworks had been obtained for both the E. coli cell cycle proteins and the P. falciparum proteins, the E. coli subnetworks were combinatorially aligned against the P. falciparum subnetworks. The central protein of the best-aligned P. falciparum subnetwork was labeled a functional ortholog of the E. coli cell cycle protein.
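The all-against-all alignment and best-match labeling step can be sketched as follows. It is a minimal sketch, assuming the alignment score is supplied as a callable `score(q, t)` for the subnetworks centered on q and t, for example the shortest-path kernel defined later in this section. It also assumes that exactly one best P. falciparum center is kept per E. coli query; ties go to the first target listed. The function name and the toy protein names and scores are hypothetical.

```python
def predict_functional_orthologs(query_centers, target_centers, score):
    """Align every query (E. coli) subnetwork against every target (P. falciparum) one.

    `score(q, t)` returns the alignment score of the subnetworks centered on q and t.
    Returns {query center: (best target center, best score)}.
    """
    predictions = {}
    for q in query_centers:
        # combinatorial alignment: score q against all target subnetworks
        scores = {t: score(q, t) for t in target_centers}
        best_target = max(scores, key=scores.get)  # first maximum wins on ties
        predictions[q] = (best_target, scores[best_target])
    return predictions


# Toy example with hypothetical protein names and made-up scores.
toy_scores = {("ec1", "pf1"): 0.9, ("ec1", "pf2"): 0.2,
              ("ec2", "pf1"): 0.1, ("ec2", "pf2"): 0.5}
print(predict_functional_orthologs(["ec1", "ec2"], ["pf1", "pf2"],
                                   lambda q, t: toy_scores[(q, t)]))
```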

How well a P. falciparum neighborhood subnetwork aligned with an E. coli neighborhood subnetwork was assessed with a numerical score for each alignment. The score was computed by a shortest-path graph kernel, which measures the similarity between two labeled networks [76]. To tailor the graph kernel to this specific use case, only paths between the central protein and other subnetwork proteins are counted. Each shortest path through the central protein characterizes the functional role of the protein in the chained molecular activities along the path. As shown in Figure 5, consider two subnetworks, S_p with central protein p and S_q with central protein q. The shortest path similarity function is defined as follows:



$$K(S_q, S_p) = \frac{1}{|S_q| + |S_p|} \sum_{\forall (i_1, i_2) \in S_q} B\big((i_1, i_2), S_p\big)$$

where

$$B\big((i_1, i_2), S_p\big) = \max_{\forall (j_1, j_2) \in S_p} \frac{2\,E(i_1, j_1)\,E(i_2, j_2)}{\mathrm{dist}(i_1, i_2) + \mathrm{dist}(j_1, j_2)}$$

Here

$$E(x, y) = \exp\left(-\frac{\mathrm{evalue}(x, y)}{\sigma}\right)$$

measures the sequence similarity between proteins x and y based on the E-value of their sequence alignment, with the normalization parameter σ = 10. dist(x, y) is the length of the shortest path connecting proteins x and y in the PPI subnetwork.
The computation was done on a -log10 scale. The method takes each pair of proteins (i1, i2) from one subnetwork and finds the pair of proteins (j1, j2) in the target subnetwork that maximizes the ratio of sequence similarity to closeness (the shortest path through the central protein). A subnetwork alignment score is then obtained in three steps: collecting the shortest paths in the two neighborhood subnetworks, computing an alignment score for each pair of proteins, and totaling all of the alignment values. This approach evaluates both the sequence similarity and the role of the central protein in the two subnetworks. It thereby summarizes the functional coherence of, and the distance between, two central proteins in a single numerical score.
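The kernel above can be sketched directly in Python. This is a minimal sketch under stated assumptions, not the authors' code. The normalization 1/(|S_q| + |S_p|) and the form E(x, y) = exp(-evalue(x, y)/σ) are read from a partly garbled equation in the source and should be treated as assumptions. The pairs (i1, i2) are taken to be unordered pairs of non-central proteins, and their distance is the length of the path through the central protein, i.e. dist(i1, center) + dist(center, i2). Both orientations of each target pair are tried. Protein pairs with no BLAST hit get similarity 0. The -log10 transform mentioned in the text is left out because the text does not say which quantity it applies to. All function names and the `evalues` dictionary keyed by (query protein, target protein) are hypothetical.

```python
import math
from collections import deque
from itertools import combinations

SIGMA = 10.0  # normalization parameter sigma given in the text


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def seq_similarity(evalues, x, y, sigma=SIGMA):
    """E(x, y) = exp(-evalue(x, y) / sigma); pairs with no BLAST hit score 0."""
    ev = evalues.get((x, y))
    return 0.0 if ev is None else math.exp(-ev / sigma)


def paths_through_center(adj, center):
    """Pairs of non-central proteins -> path length via the center; also returns |S|."""
    d = bfs_distances(adj, center)
    others = sorted(n for n in d if n != center)
    pairs = {(a, b): d[a] + d[b] for a, b in combinations(others, 2)}
    return pairs, len(d)


def shortest_path_kernel(adj_q, q, adj_p, p, evalues, sigma=SIGMA):
    """K(S_q, S_p): for each pair in S_q take the best-matching pair in S_p, sum, normalize."""
    pairs_q, size_q = paths_through_center(adj_q, q)
    pairs_p, size_p = paths_through_center(adj_p, p)
    total = 0.0
    for (i1, i2), dist_q in pairs_q.items():
        best = 0.0
        for (j1, j2), dist_p in pairs_p.items():
            # try both orientations of the target pair (j1, j2) and (j2, j1)
            sim = max(
                seq_similarity(evalues, i1, j1, sigma) * seq_similarity(evalues, i2, j2, sigma),
                seq_similarity(evalues, i1, j2, sigma) * seq_similarity(evalues, i2, j1, sigma),
            )
            best = max(best, 2.0 * sim / (dist_q + dist_p))
        total += best  # B((i1, i2), S_p)
    return total / (size_q + size_p)
```

Because every pair in S_q is compared with every pair in S_p, this direct version scales with the fourth power of the subnetwork size. It is practical for toy examples only; neighborhoods of several hundred proteins would need pruning or indexing.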

Figure 6 shows an example of how the subnetwork alignment approach is used to predict functional orthologs; annotations are given in Additional File 2. The P. falciparum protein encoded by locus PF08_0126 (UniProt ID Q8IAN4, a putative DNA repair protein rad54) and the E. coli protein DamX (P11557) showed no significant homology. Even so, they shared eight pairs of sequence and network orthologs when their PPI networks were aligned. DamX has been shown to interfere, directly or indirectly, with cell division in E. coli [77,78]. Despite their low sequence similarity (BLAST E-value 663), the network alignment evidence suggests that DamX and Q8IAN4 are likely to be functional orthologs.

Data preparation and network analysis 

Protein-protein interaction data for E. coli were downloaded from the IntAct database [44]. Protein association data for P. falciparum were extracted from the STRING database [45]. STRING assigns association confidence scores (S), ranging from 0.15 to 0.999. These scores are based on sequence similarity, pathway analysis [24,46], chromosome synteny, genome organization, phylogenetic reconstruction, and literature text mining. Cytoscape 2.8.3 was used for network visualization [47]. Nodes are colored according to their functional classification in the eggNOG database
Figure 5 Subnetwork alignment. See the Methods section for a description of the algorithm.
Figure 6 An example of functional orthologs predicted by subnetwork alignment. A subnetwork alignment between E. coli (proteins labeled in blue) and P. falciparum (proteins labeled in red). The subnetworks are similar and composed almost entirely of protein pairs with low BLAST E-values, that is, homologous pairs. It is therefore likely that Q8IAN4 and P11557 are functional homologs, despite their low sequence similarity.



[48]. NetworkAnalyzer was used to compute topological 
parameters of the networks [49], with the default settings. 
Gene Ontology (GO) enrichment analysis was conducted 
using BiNGO [50]. The hypergeometric test was used 
with the Benjamini and Hochberg false discovery rate 
(FDR) correction with a significance level of 0.05. 
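The enrichment statistics named above can be sketched with the standard library. The sketch shows a one-sided hypergeometric over-representation test followed by Benjamini-Hochberg adjustment at α = 0.05. This is a minimal illustration of the two standard procedures, not BiNGO's own code. The reference-set size M and the per-term counts are assumed to be supplied by the caller, and the function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, M, n, N):
    """One-sided hypergeometric p-value P(X >= k).

    M: genes in the reference set, n: reference genes annotated with the GO term,
    N: genes in the study set (e.g. the predicted cell cycle proteins),
    k: study-set genes annotated with the term.
    """
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(term_counts, M, N, alpha=0.05):
    """term_counts: {GO term: (k, n)}. Returns {term: adjusted p} for adjusted p < alpha."""
    terms = list(term_counts)
    raw = [hypergeom_enrichment_p(term_counts[t][0], M, term_counts[t][1], N) for t in terms]
    adjusted = benjamini_hochberg(raw)
    return {t: q for t, q in zip(terms, adjusted) if q < alpha}
```

`math.comb` returns 0 when asked to choose more items than are available, which keeps the tail sum correct when the study set is larger than the number of unannotated reference genes.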